Language-Independent Methods for Compiling Monolingual Lexical Data

Authors

  • Christian Biemann
  • Stefan Bordag
  • Gerhard Heyer
  • Uwe Quasthoff
  • Christian Wolff
Abstract

In this paper we describe a flexible, portable and language-independent infrastructure for setting up large monolingual language corpora. The approach is based on collecting a large amount of monolingual text from various sources. The input data is processed using a sentence-based text segmentation algorithm. We describe the entry structure of the corpus database as well as various query types and tools for information extraction. Among them, the extraction and usage of sentence-based word collocations is discussed in detail. Finally, we give an overview of different applications for this language resource. A WWW interface allows for public access to most of the data and information extraction tools (http://wortschatz.uni-leipzig.de).
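To make the idea of sentence-based collocations concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the regex-based sentence splitter, the lowercase tokenization, and the function names `split_sentences` and `sentence_cooccurrences` are all assumptions chosen for illustration, and it reports raw co-occurrence counts only, without any significance measure.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive sentence splitter (an assumption, not the paper's algorithm):
    break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_cooccurrences(text):
    """Count, for each unordered word pair, the number of sentences
    in which both words occur together."""
    counts = Counter()
    for sentence in split_sentences(text):
        # Each distinct word is counted once per sentence; sorting makes
        # the pair order deterministic.
        words = sorted(set(re.findall(r"\w+", sentence.lower())))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    sample = "The cat sat. The cat ran. A dog ran."
    for pair, n in sentence_cooccurrences(sample).most_common(3):
        print(pair, n)
```

A real corpus pipeline would normally rank such pairs with a statistical significance measure rather than raw counts, and would need a segmentation step that handles abbreviations and numbers.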


Similar resources

DanNet: the challenge of compiling a wordnet for Danish by reusing a monolingual dictionary

This paper is a contribution to the discussion on compiling computational lexical resources from conventional dictionaries. It describes the theoretical as well as practical problems that are encountered when reusing a conventional dictionary for compiling a lexical-semantic resource in terms of a wordnet. More specifically, it describes the methodological issues of compiling a wordnet for Dani...


Early Phonological and Lexical Development of a Farsi Speaking Child: A Longitudinal Case Study

The present study aims at the description and analysis of the phonological and lexical development of a child who is acquiring Farsi as his first language. The child's language production at the holophrastic stage of language development, mainly single words, is observed and recorded longitudinally for nearly seven months, from when he was 16 months old until he turned 23 months. An attempt is mad...


Lexical quality and executive control predict children’s first and second language reading comprehension

This study compared how lexical quality (vocabulary and decoding) and executive control (working memory and inhibition) predict reading comprehension directly as well as indirectly, via syntactic integration, in monolingual and bilingual fourth grade children. The participants were 76 monolingual and 102 bilingual children (mean age 10 years, SD = 5 months) learning to read Dutch in the Netherl...


The Effect of Bilingualism/ Monolinguals on L2 Working Memory Capacity and Verbal Intelligence

Issues related to bilingualism and the effects it might have on language learners’ cognitive and meta-cognitive variables have attracted the attention of a number of researchers in the field of Second Language Acquisition (SLA). Over the past couple of decades, there has been a plethora of studies on cognitive and metacognitive differences between bilinguals and monolinguals. However, the impac...


Structural Properties Of Lexical Systems: Monolingual And Multilingual Perspectives

We introduce a new type of lexical structure called lexical system, an interoperable model that can feed both monolingual and multilingual language resources. We begin with a formal characterization of lexical systems as “pure” directed graphs, solely made up of nodes corresponding to lexical entities and links. To illustrate our approach, we present data borrowed from a lexical system that ha...




Publication date: 2004